In-class Exercise 1: My First Date with Quarto

Dr. Kam Tin Seong
Assoc. Professor of Information Systems

School of Computing and Information Systems,
Singapore Management University

21 Feb 2023

Content

  • What is Reproducible Research
  • A brief introduction to R and RStudio
  • Introduction to Quarto
  • Building the course webpage by using Quarto
  • Doing Data Science with tidyverse

What is Reproducible Research?

  • Research is considered to be reproducible when the exact results can be reproduced if given access to the original data, software, or code.

  • Reproducible research is sometimes known as reproducibility, reproducible statistical analysis, reproducible data analysis, reproducible reporting, and literate programming.

Source: https://www.displayr.com/what-is-reproducible-research/

Why Does Research Need to Be Reproducible?

  • According to a survey by Nature, more than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments.

What needs to be reproduced?

The “what” that needs to be reproduced is typically:

  • Actual results themselves, which include:
    • Tables

    • Visualizations/figures/graphs

    • Values reported in the text

  • The statistical evidence in support of the findings (e.g., p-values, confidence intervals, credible intervals).

Ten simple rules for reproducible research

  • For every result, keep track of how it was produced.
  • Avoid manual data manipulation steps.
  • Archive the exact versions of all external programmes used.
  • Version control all custom scripts.
  • Record all intermediate results, when possible in standard formats.
  • For analyses that include randomness, note underlying random seeds.
  • Always store raw data behind plots.
  • Generate hierarchical analysis output, allowing layers of increasing detail to be inspected.
  • Connect textual statements to underlying results.
  • Provide public access to scripts, runs, and results.
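
Several of these rules map onto everyday base R habits. The sketch below is only an illustration: the seed value, object names, and output file name are arbitrary assumptions rather than part of the rules. It fixes a random seed, stores the data behind a result in a standard format, and records the software versions in use.

```r
# Fix the random seed so that the random draw is repeatable.
# The seed value 1234 is an arbitrary choice for illustration.
set.seed(1234)
draw_a <- rnorm(5)

# Re-setting the same seed reproduces exactly the same numbers.
set.seed(1234)
draw_b <- rnorm(5)
same_draw <- identical(draw_a, draw_b)
print(same_draw)

# Store the data behind a result in a standard format (CSV).
# The file name is hypothetical; a temporary directory keeps the sketch tidy.
out_file <- file.path(tempdir(), "draw_a.csv")
write.csv(data.frame(value = draw_a), out_file, row.names = FALSE)

# Record the exact versions of R and of the attached packages.
print(R.version.string)
print(sessionInfo())
```

Saving the `sessionInfo()` output alongside the results is one simple way to keep a record of which versions produced them.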

R: A very brief introduction

  • R is a powerful language and environment for statistical computing and graphics. It is a re-implementation of the S language, which was developed at Bell Laboratories starting in the 1970s.

  • R is a high-level language. The core language has some superficial similarities to C, but many things are handled automatically in R that are not in C.

  • It is free and open-source software (FOSS), distributed in source code form under the terms of the Free Software Foundation’s GNU General Public License.

  • It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows, and macOS.

  • It is available from the Comprehensive R Archive Network (CRAN).

The R environment

  • An effective data handling and storage facility.

  • A suite of operators for calculations on arrays, in particular matrices.

  • A large, coherent, integrated collection of intermediate tools for data analysis.

  • Graphical facilities for data analysis and display, either on-screen or on hardcopy.

  • A well-developed, simple and effective programming language that includes conditionals, loops, user-defined recursive functions, and input and output facilities.

  • It is highly extensible, with thousands of well-documented extensions (called R packages) covering a very broad range of application areas such as finance, business, economics, and biostatistics (about 18,471 R packages as of August 2022).

  • It has a vast community in both academia and business, with active forums such as Stack Overflow and RStudio Community.
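
To make the language features listed above concrete, here is a small base R sketch. The object and function names are made up for illustration. It shows operators applied to a matrix, a conditional, a loop, and a user-defined recursive function.

```r
# Operators work on whole arrays and matrices at once.
m <- matrix(1:4, nrow = 2)   # filled column by column: columns (1, 2) and (3, 4)
m_squared <- m %*% m         # matrix product
m_doubled <- m * 2           # element-wise arithmetic
print(m_squared)
print(m_doubled)

# A user-defined recursive function that uses a conditional.
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}

# A loop that collects the results for 1 to 5.
results <- numeric(5)
for (i in 1:5) {
  results[i] <- factorial_rec(i)
}
print(results)
```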

The CRAN Task View

Installing and Configuring R

  • Download the R installer by visiting one of the following links:

  • Install R by running the installer. If necessary, grant the installer administrator rights. Install R in the root directory when prompted.

  • After the installation is complete, check the environment variables of your computer. If the R path is not defined, update the path manually.
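
One way to check the result of these steps from within R itself is sketched below. It uses only base R functions. The messages are illustrative, and the suggestion to add the `bin` folder to the path is an assumption about a typical setup.

```r
# Version string of the R installation that is running this code.
r_version <- R.version.string
print(r_version)

# Directory where this R installation lives.
r_home <- R.home()
print(r_home)

# Look for an R executable on the PATH.
# Sys.which() returns an empty string when nothing is found.
r_on_path <- unname(Sys.which("R"))
if (nzchar(r_on_path)) {
  message("R found on PATH at: ", r_on_path)
} else {
  message("R is not on PATH; consider adding ", R.home("bin"))
}
```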

Introducing Rtools

  • A toolchain bundle for Windows used for building R packages from source (those that need compilation of C/C++ or Fortran code) and for building R itself.

  • Download Rtools from this site.

  • After the installation is complete, check the environment variables of your computer to ensure that the Rtools path is there.
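
A quick way to see whether build tools are visible to R is sketched below. Treating `make` and `gcc` as representative tools is an assumption made for illustration; the full toolchain contains more than these two.

```r
# Look up two representative build tools on the PATH.
# Sys.which() returns a named vector; an empty string means "not found".
tools <- Sys.which(c("make", "gcc"))
print(tools)

# Collect the names of any tools that were not found.
missing_tools <- names(tools)[!nzchar(tools)]
if (length(missing_tools) == 0) {
  message("Build tools found on PATH.")
} else {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
}
```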

Introducing RStudio

  • A free and open-source integrated development environment (IDE) for R.

Introducing Quarto

  • Quarto is a new open-source scientific and technical publishing system designed and developed by Posit, formerly known as RStudio.

Introducing Quarto

  • Quarto documents are authored using markdown, an easy-to-write plain-text format.

  • Quarto documents are generated using Pandoc, a universal document converter.
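
To give a feel for what such a document looks like, the R sketch below writes a minimal Quarto file to a temporary folder. The file name `hello.qmd`, the title, and the body text are illustrative assumptions. The file has a YAML header, some markdown text, and one R code chunk.

```r
# Three backticks, built programmatically so the chunk fence is easy to read here.
fence <- strrep("`", 3)

# Lines of a minimal Quarto document: YAML header, markdown text, one R chunk.
qmd_lines <- c(
  "---",
  "title: \"My First Quarto Document\"",
  "format: html",
  "---",
  "",
  "## Hello Quarto",
  "",
  "This text is written in *markdown*.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

# Write the document to a temporary folder (hypothetical file name).
qmd_file <- file.path(tempdir(), "hello.qmd")
writeLines(qmd_lines, qmd_file)
print(qmd_file)

# With Quarto installed, running `quarto render hello.qmd` in a terminal
# opened in that folder would convert the file to HTML.
```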

Introducing Quarto

  • A variety of extensions to Pandoc markdown useful for technical writing including cross-references, sub-figures, layout panels, hoverable citations and footnotes, callouts, and more.

  • A project system for rendering groups of documents at once, sharing options across documents, and producing aggregate output like websites and books.

  • Authoring using a wide variety of editors and notebooks including JupyterLab, RStudio, and VS Code.

  • A visual markdown editor that provides a productive writing interface for composing long-form documents.
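
As a rough illustration of the project system, the R sketch below creates a folder containing a `_quarto.yml` file that declares a website project, plus a single page. The folder name, site title, and theme are illustrative assumptions, not settings prescribed by this exercise.

```r
# Create a project folder in a temporary location (hypothetical name).
project_dir <- file.path(tempdir(), "my-course-site")
dir.create(project_dir, showWarnings = FALSE)

# Project-level options shared by every document in the folder.
config_lines <- c(
  "project:",
  "  type: website",
  "",
  "website:",
  "  title: \"My Course Site\"",
  "  navbar:",
  "    left:",
  "      - href: index.qmd",
  "        text: Home",
  "",
  "format:",
  "  html:",
  "    theme: cosmo"
)
config_file <- file.path(project_dir, "_quarto.yml")
writeLines(config_lines, config_file)

# A minimal home page for the site.
index_file <- file.path(project_dir, "index.qmd")
writeLines(
  c("---", "title: \"Home\"", "---", "", "Welcome to the course site."),
  index_file
)

# With Quarto installed, running `quarto render` in a terminal opened in
# project_dir would render every document in the project at once.
print(list.files(project_dir))
```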

Learn more about Quarto at https://quarto.org.

Downloading and Installing Quarto

  • Visit this site and download the Quarto installer for your operating system.
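
After installing, one way to confirm that Quarto can be reached from R is sketched below. It assumes that the installer placed the `quarto` command on the PATH, which may not hold on every setup.

```r
# Look for the Quarto command-line tool on the PATH.
quarto_bin <- unname(Sys.which("quarto"))

if (nzchar(quarto_bin)) {
  # Ask Quarto to report its version number.
  quarto_version <- system2("quarto", "--version", stdout = TRUE)
  message("Quarto version: ", quarto_version)
} else {
  message("Quarto was not found on PATH.")
}
```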

It’s Time to Get Your Hands Dirty!

To Learn More

Markdown Basics

Figures

Tables

Diagrams

Callout Blocks

Article Layout

RStudio IDE

Quarto Guide